Text-Independent Speaker Identification Based on Small Training Data and Fast

Author

  • Chung-Hsien Wu
Abstract

In this paper, a hierarchical approach to text-independent speaker identification based on small training data and fast search algorithms is proposed. Driven by the special properties of Mandarin speech, the system segments speech signals into phoneme-like units, which are suitably clustered across all Mandarin syllables. As a result, only 75 syllables, instead of 408, are needed to train the identification system. Line spectrum pair (LSP) frequencies are used as the identification feature. In the training process, a fast K-means algorithm is proposed to reduce the time for vector quantization (VQ). In the identification process, a three-level hierarchical identification scheme is proposed to improve identification performance. A speaker candidate selector in the first level and a cluster candidate selector in the second level reduce the identification time. In the third level, a lateral inhibition Gaussian (LIG) network is proposed to give better discrimination among speakers. In the experiments, using a codebook size of 128 vectors for each VQ codebook, an average identification rate of 91.1% was obtained for token lengths from 1 to 10 among a population of 36 speakers (27 male, 9 female). In the speed experiments, the fast K-means algorithm took about half the time of the K-means algorithm, and the speaker candidate selector and cluster candidate selector saved about two thirds of the total identification time.
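The abstract builds on per-speaker VQ codebooks trained with K-means. As a rough illustration only, the sketch below trains one codebook per speaker with the standard (Lloyd) K-means iteration and identifies a test token by the lowest average distortion. It is not the paper's fast K-means, three-level hierarchy, or LIG network, none of which are specified in the abstract. The function names, the tiny codebook size of 4, and the synthetic 2-D vectors standing in for LSP feature vectors are all assumptions made for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, v):
    """Index of the codeword closest to vector v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))

def train_codebook(vectors, size, iters=20, seed=0):
    """Standard K-means (Lloyd) VQ training; NOT the paper's fast variant."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iters):
        cells = [[] for _ in range(size)]
        for v in vectors:
            cells[nearest(codebook, v)].append(v)
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codeword
                codebook[i] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook

def avg_distortion(codebook, vectors):
    """Mean squared distance from each vector to its nearest codeword."""
    return sum(sq_dist(codebook[nearest(codebook, v)], v) for v in vectors) / len(vectors)

def identify(codebooks, vectors):
    """Return the speaker whose codebook yields the lowest average distortion."""
    return min(codebooks, key=lambda name: avg_distortion(codebooks[name], vectors))

def synth(center, n, seed):
    """Hypothetical stand-in for feature extraction: noisy points around a center."""
    rng = random.Random(seed)
    return [[c + rng.gauss(0, 0.1) for c in center] for _ in range(n)]

# Two synthetic "speakers" with well-separated feature distributions.
codebooks = {
    "A": train_codebook(synth([0.0, 0.0], 50, seed=1), size=4),
    "B": train_codebook(synth([3.0, 3.0], 50, seed=2), size=4),
}
test_token = synth([3.0, 3.0], 10, seed=3)
result = identify(codebooks, test_token)
print(result)
```

In a real system the exhaustive search in `identify` grows with the number of speakers and codewords, which is the cost that the candidate selectors described in the abstract aim to cut.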

Similar resources

Selective use of the speech spectrum and a VQGMM method for speaker identification

This paper describes two separate sets of speaker identification experiments. In the first set of experiments, the speech spectrum is selectively used for speaker identification. The results show that the higher portion of the speech spectrum contains more reliable idiosyncratic information on speakers than does the lower portion of equal bandwidth. In the second set of experiments, a vector-quanti...

Using maximum likelihood linear regression for segment clustering and speaker identification

Many adaptation scenarios rely on clustering of either the test or training data. Although consistency between the clustering and adaptation objective functions is desired, most previous approaches have not implemented such consistency. This paper shows that the statistics used in Maximum Likelihood Linear Regression (MLLR) adaptation are sufficient to cluster data with a consistent Maximum Likel...

Audio-visual speaker recognition for video broadcast news: some fusion techniques

Audio-based speaker identification degrades severely when there is a mismatch between training and test conditions due to either channel or noise. In this paper, we explore various techniques to fuse video-based speaker identification with audio-based speaker identification to improve the performance under mismatched conditions. Specifically, we explore techniques to optimally determine the relativ...

Speaker verification with limited enrollment data

New methods for speaker verification that address the problems of limited training data and unknown telephone channel are presented. We describe a system for studying the feasibility of telephone-based voice signatures for electronic documents that uses speaker verification with a fixed test phrase but very limited data for training speaker models. We examine three methods for speaker verification t...

Comparison of text-independent speaker recognition methods on telephone speech with acoustic mismatch

We compare speaker recognition performance of Vector Quantization (VQ), Gaussian Mixture Modeling (GMM) and the Arithmetic Harmonic Sphericity measure (AHS) in adverse telephone speech conditions. The aim is to address the question: how do multimodal VQ and GMM typically compare to the simpler unimodal AHS for matched and mismatched training and testing environments? We study identification (clo...

Publication date: 2007